Faster Gradient Descent Training of Hidden Markov Models, Using Individual Learning Rate Adaptation
Abstract
Hidden Markov Models (HMMs) are probabilistic models suited to a wide range of pattern recognition tasks. In this work, we propose a new gradient descent method for Conditional Maximum Likelihood (CML) training of HMMs that significantly outperforms traditional gradient descent. Instead of using a fixed learning rate for every adjustable parameter of the HMM, we adapt an independent learning rate (step size) for each parameter, a strategy that has proved valuable in training Artificial Neural Networks. On applications from molecular biology, our approach increases convergence speed by up to a factor of five compared with standard gradient descent, while at the same time making the training procedure more robust. This is accomplished without additional computational complexity or the need for parameter tuning.
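To make the strategy concrete, here is a minimal sketch of sign-based individual step-size adaptation in the style of Rprop, one well-known instance of the family of methods the abstract refers to; the paper's exact update rule may differ. The quadratic objective, function names, and hyperparameter values are illustrative stand-ins, and in a real HMM the probability parameters would typically be reparameterised (e.g. via softmax) so that unconstrained updates remain valid.

```python
import numpy as np

def rprop_step(theta, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=1.0):
    """One Rprop-style update: every parameter keeps its own step size.

    If a parameter's gradient keeps its sign, its step grows (eta_plus);
    if the sign flips, its step shrinks (eta_minus). The move itself uses
    only the sign of the gradient, not its magnitude. This is a simplified
    variant that omits the weight-backtracking refinement of full Rprop.
    """
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    theta = theta - np.sign(grad) * step
    return theta, step

# Toy demo: minimise a quadratic as a stand-in for the negative CML objective.
theta = np.array([4.0, -3.0])
step = np.full_like(theta, 0.1)   # independent step size per parameter
prev_grad = np.zeros_like(theta)
for _ in range(50):
    grad = 2.0 * theta            # gradient of ||theta||^2
    theta, step = rprop_step(theta, grad, prev_grad, step)
    prev_grad = grad
print(theta)                      # close to the minimiser [0, 0]
```

The key property is that each parameter's effective step size evolves independently of the gradient magnitude, which is what allows convergence to speed up without tuning a single global learning rate.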
Similar Papers
Adaptive Back-Propagation in On-Line Learning of Multilayer Networks
An adaptive back-propagation algorithm is studied and compared with gradient descent (standard back-propagation) for on-line learning in two-layer neural networks with an arbitrary number of hidden units. Within a statistical mechanics framework, both numerical studies and a rigorous analysis show that the adaptive back-propagation method results in faster training by breaking the symmetry between...
A new look at discriminative training for hidden Markov models
Discriminative training for hidden Markov models (HMMs) has been a central theme in speech recognition research for many years. One of the most popular techniques is minimum classification error (MCE) training, whose objective function is closely related to the empirical error rate and whose optimization has traditionally been based on gradient descent. In this paper, we provide a new look at the MCE ...
Improving the Convergence of the Backpropagation Algorithm Using Learning Rate Adaptation Methods
This article focuses on gradient-based backpropagation algorithms that use either a common adaptive learning rate for all weights or an individual adaptive learning rate for each weight, and that apply the Goldstein/Armijo line search. The learning-rate adaptation is based on descent techniques and on estimates of the local Lipschitz constant that are obtained without additional error function and gradient...
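As a hedged illustration of the local-Lipschitz idea mentioned in that snippet (the exact formulation in the cited paper may differ; the function name and the cap below are my own), the constant can be estimated from two successive iterates and its reciprocal used to bound the step size:

```python
import numpy as np

def lipschitz_step_size(w, w_prev, g, g_prev, eta_max=1.0):
    """Learning rate from a local Lipschitz estimate.

    L is estimated as ||g - g_prev|| / ||w - w_prev||; for an L-smooth
    error surface a step of 1/(2L) is a safe, descent-guaranteeing
    choice. eta_max caps the step when the estimate is tiny or zero.
    """
    dw = np.linalg.norm(w - w_prev)
    if dw == 0.0:
        return eta_max
    L = np.linalg.norm(g - g_prev) / dw
    return eta_max if L == 0.0 else min(eta_max, 1.0 / (2.0 * L))
```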
Syllable based DNN-HMM Cantonese Speech to Text System
This paper reports our work on building a Cantonese Speech-to-Text (STT) system with a syllable-based acoustic model. It is part of an effort to build an STT system that aids dyslexic students who have difficulty with writing skills but no problem expressing their ideas through speech. For Cantonese speech recognition, the basic unit of the acoustic model can be either the conve...
Learning to learn by gradient descent by reinforcement learning
The learning rate is a free parameter in many optimization algorithms, including Stochastic Gradient Descent (SGD). Choosing a good value of the learning rate is non-trivial for important non-convex problems such as the training of Deep Neural Networks. In this work, we formulate the optimization process as a Partially Observable Markov Decision Process and pose the choice of learning rate per time step...